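Types¶

Basic Environment¶

Namespaces¶

In Python, every module (source file) has a global namespace. Depending on the scope in which code runs, there is also a current (local) namespace. When code executes directly at module level, the local and global namespaces are the same; inside a function, the current namespace is the function's scope.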

In [1]:

x = 100
print(id(globals))
print(id(locals))

140669702279960
140669702280680

In [2]:

globals()

Out[2]:

{'In': ['', 'x = 100\nprint(id(globals))\nprint(id(locals))', 'globals()'],
 'Out': {},
 '_': '',
 '__': '',
 '___': '',
 '__builtin__': <module 'builtins' (built-in)>,
 '__builtins__': <module 'builtins' (built-in)>,
 '__doc__': 'Automatically created module for IPython interactive environment',
 '__loader__': None,
 '__name__': '__main__',
 '__package__': None,
 '__spec__': None,
 '_dh': ['/home/kaka/blog/my_blog/content/python3-note'],
 '_i': 'x = 100\nprint(id(globals))\nprint(id(locals))',
 '_i1': 'x = 100\nprint(id(globals))\nprint(id(locals))',
 '_i2': 'globals()',
 '_ih': ['', 'x = 100\nprint(id(globals))\nprint(id(locals))', 'globals()'],
 '_ii': '',
 '_iii': '',
 '_oh': {},
 'exit': <IPython.core.autocall.ZMQExitAutocall at 0x7ff02dab05f8>,
 'get_ipython': <bound method InteractiveShell.get_ipython of <ipykernel.zmqshell.ZMQInteractiveShell object at 0x7ff02dae6cf8>>,
 'quit': <IPython.core.autocall.ZMQExitAutocall at 0x7ff02dab05f8>,
 'x': 100}

In [4]:

def test():
    x = "hello world"
    print(locals())
    print(id(globals))
    print(id(locals))
    
test()

{'x': 'hello world'}
140669702279960
140669702280680

Because a namespace is exposed as a dict, we can modify it directly to bind a name to an object.

In [5]:

globals()["hello"] = "hello world"
hello

Out[5]:

'hello world'

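Direct manipulation of a namespace does not always work. Function execution uses a caching mechanism for local variables, so modifying the local namespace directly may have no effect. In normal code, avoid modifying namespaces directly.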
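A minimal sketch of the points above, assuming CPython. Note that `id(globals)` in the earlier cells is the id of the built-in function object, not of a namespace dict, so this sketch compares the dicts returned by `globals()` and `locals()` instead. The names `probe`, `module_ns` and `injected` are illustrative only.

```python
def probe():
    x = 1
    snapshot = locals()
    snapshot["x"] = 2            # changes only the dict returned by locals()
    return x, snapshot is globals()

local_value, same_namespace_in_function = probe()

# Simulate module-level code: one dict serves as both globals and locals.
module_ns = {}
exec("same = globals() is locals()", module_ns)
same_namespace_at_module_level = module_ns["same"]

# Writing through globals() does create a real global binding.
globals()["injected"] = "hello"
```

Here `local_value` stays 1 even though the dict entry was set to 2, which is the caching effect described above.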

In the namespace dict a name is just a string key, so a name can be rebound to another object regardless of that object's type.

In [6]:

x = 100
print(id(x))
x = "hello"  # rebinds the name to a new object; the original object is not modified
print(id(x))

140669700048672
140669508307464

One object can have several names.

In [7]:

x = 1234
y = x
y is x # use is to test whether two names refer to the same object; == can be overloaded and may compare values only

Out[7]:

True

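Recommended naming rules:

- Type names use the CapWords format
- Module file names, functions, methods and members use the lower_case_with_underscores format
- Module-level constants use the UPPER_CASE_WITH_UNDERSCORES format
- Avoid names that clash with built-in functions or common standard-library types, to prevent confusion

Names that start with an underscore carry a special meaning:

- A module member starting with a single underscore `(_x)` is private and is not imported by a star import
- A class member starting with a double underscore but not ending with one `(__x)` is a private member that is renamed automatically (name mangling)
- A name that starts and ends with double underscores `(__x__)` is usually a system member and should be avoided
- In interactive mode, a single underscore `(_)` holds the result of the last expression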
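The automatic renaming of `__x` members can be sketched as follows. `Account` and its attributes are hypothetical names used only for illustration.

```python
class Account:
    def __init__(self):
        self._hint = "private by convention only"
        self.__balance = 10      # stored on the instance as _Account__balance

acct = Account()
plain_name_exists = hasattr(acct, "__balance")   # the unmangled name is absent
mangled_value = acct._Account__balance           # the mangled name is reachable
```

Name mangling avoids accidental clashes in subclasses; it is not an access-control mechanism.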

In [9]:

1 + 2 + 3

Out[9]:

6

In [11]:

_

Out[11]:

6

Strong References¶

In [15]:

import sys  
a = 1234
b = a
print(sys.getrefcount(a)) # getrefcount() also references the object through its argument, so the count is 1 higher
del a
print(sys.getrefcount(b))

4
3

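Weak References¶

A weak reference keeps a reference to the target without increasing its reference count or preventing it from being reclaimed (types such as int and tuple do not support weak references).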

In [26]:

import weakref

class X:
    def __del__(self):
        print(id(self), "dead.")
        
c = X()
sys.getrefcount(c)

Out[26]:

In [27]:

w = weakref.ref(c)
w() is c

Out[27]:

True

In [28]:

sys.getrefcount(c)

Out[28]:

In [29]:

del c

140669507549728 dead.

In [30]:

w() is None

Out[30]:

True

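Weak references are often used for caching, monitoring and other "bolt-on" scenarios: they do not affect the target object and cannot prevent it from being reclaimed. Another typical use is implementing a finalizer, that is, running extra cleanup when the object is reclaimed.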

In [36]:

d = X()
def callback(w):
    print(w, w() is None)

w = weakref.ref(d, callback) # set a callback when creating the weak reference

In [37]:

del d

140669397326592 dead.
<weakref at 0x7ff0257547c8; dead> True

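A destructor is not used here because it is a member of the target and is meant to clean up the object's internal resources; it should not deal with unrelated external concerns. A finalizer is therefore a reasonable choice.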
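The standard library also offers `weakref.finalize` (Python 3.4 and later), which wraps the callback pattern. A minimal sketch, assuming CPython's immediate reference counting; `Connection` and `events` are hypothetical names.

```python
import gc
import weakref

class Connection:          # hypothetical resource type
    pass

events = []
conn = Connection()
# Register external cleanup without touching the class itself.
finalizer = weakref.finalize(conn, events.append, "connection released")

del conn                   # last strong reference disappears
gc.collect()               # not needed in CPython, harmless elsewhere
```

After the object is reclaimed the finalizer has run exactly once and reports itself as no longer alive.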

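The main difference between a weak reference and an ordinary name is the function-call syntax needed to reach the target. weakref.proxy improves on this and gives the same syntax as an ordinary name reference.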

In [38]:

a = X()
a.name = "kaka"
w = weakref.ref(a)
w.name

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-38-86b23ebf3104> in <module>()
      2 a.name = "kaka"
      3 w = weakref.ref(a)
----> 4 w.name

AttributeError: 'weakref' object has no attribute 'name'

In [39]:

w().name

Out[39]:

'kaka'

In [40]:

p = weakref.proxy(a)
p

Out[40]:

<__main__.X at 0x7ff02575cb88>

In [41]:

p.name

Out[41]:

'kaka'

In [43]:

p.age = 60 # attributes can be assigned directly through the proxy
p.age

Out[43]:

60

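Copying Objects¶

A shallow copy copies only the object itself and keeps references to the same member objects; a deep copy recursively copies all referenced members.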

In [45]:

class X: pass
x = X()
x.data = [1, 2]

import copy 

x2 = copy.copy(x)
x2 is x

Out[45]:

False

In [46]:

x2.data is x.data # the data member still points to the original list; only the reference was copied

Out[46]:

True

In [47]:

x3 = copy.deepcopy(x)
x3 is x

Out[47]:

False

In [48]:

x3.data is x.data

Out[48]:

False

Cyclic References and Garbage Collection¶

In [54]:

class X: 
    def __del__(self): 
        print(self, "dead.")
        
import gc
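gc.disable() # turn gc off during performance tests so that collection does not distort timing
a = X()
b = X()
a.x = b
b.x = a # build a reference cycle

del a
del b # after all names are deleted the objects are still not reclaimed; reference counting alone cannot break the cycle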

gc.enable()
gc.collect()

<__main__.X object at 0x7ff02c12ee48> dead.
<__main__.X object at 0x7ff02c12e978> dead.

Out[54]:
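The same behavior can be observed without `__del__` by probing the objects with a weak reference. A minimal sketch, assuming CPython; `Node` and `probe` are illustrative names.

```python
import gc
import weakref

class Node:
    pass

gc.collect()                 # start from a clean state
gc.disable()                 # no automatic cycle collection while we look
a, b = Node(), Node()
a.other, b.other = b, a      # reference cycle
probe = weakref.ref(a)

del a, b
alive_after_del = probe() is not None       # the cycle keeps both objects alive

gc.enable()
gc.collect()                 # the cycle detector reclaims the pair
alive_after_collect = probe() is not None
```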

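Compilation¶

Apart from interactive mode and manual compilation, source code is compiled when it is imported. The compiled bytecode is cached for reuse and is usually saved to disk as well.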

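Python 3 stores bytecode cache files in a dedicated directory (__pycache__/*.pyc), so the next time the program starts it can skip compilation and import faster. The header of the cache file stores compilation information that is used to decide whether the source file has changed.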
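A small sketch of where the cache file ends up and how its header starts, using only `importlib.util` and `py_compile`. The module name `demo.py` is hypothetical, and only the 4-byte magic number is inspected because the rest of the header layout differs between Python versions.

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    source_path = os.path.join(workdir, "demo.py")   # hypothetical module
    with open(source_path, "w") as f:
        f.write("print('hello')\n")

    # Where the import system expects the cache file for this source.
    expected_cache = importlib.util.cache_from_source(source_path)
    cache_path = py_compile.compile(source_path)

    with open(cache_path, "rb") as f:
        magic = f.read(4)    # the header begins with the bytecode magic number
```

The magic number identifies the bytecode version, which is one of the checks that decides whether a cached file can be reused.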

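Besides the bytecode that serves as execution instructions, a code object carries a lot of metadata; together they form the unit of execution. From this metadata you can obtain parameters, closure variables and other information.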

In [1]:

def  add(x, y):
    return x + y

add.__code__

Out[1]:

<code object add at 0x7f7f240d4b70, file "<ipython-input-1-fae8dc1acce6>", line 1>

In [2]:

dir(add.__code__)

Out[2]:

['__class__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__le__',
 '__lt__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'co_argcount',
 'co_cellvars',
 'co_code',
 'co_consts',
 'co_filename',
 'co_firstlineno',
 'co_flags',
 'co_freevars',
 'co_kwonlyargcount',
 'co_lnotab',
 'co_name',
 'co_names',
 'co_nlocals',
 'co_stacksize',
 'co_varnames']

In [3]:

add.__code__.co_varnames

Out[3]:

('x', 'y')

In [4]:

add.__code__.co_code

Out[4]:

b'|\x00\x00|\x01\x00\x17S'

Raw bytecode is not readable as it stands, but it can be disassembled.

In [5]:

import dis

dis.dis(add)

  2           0 LOAD_FAST                0 (x)
              3 LOAD_FAST                1 (y)
              6 BINARY_ADD
              7 RETURN_VALUE

Sometimes code has to be compiled manually.

In [8]:

source = """
print("hello, world")
print(1 + 2)
"""
code = compile(source, "demo", "exec") # the file name is used in messages such as tracebacks
dis.show_code(code)

Name:              <module>
Filename:          demo
Argument count:    0
Kw-only arguments: 0
Number of locals:  0
Stack size:        3
Flags:             NOFREE
Constants:
   0: 'hello, world'
   1: 1
   2: 2
   3: None
   4: 3
Names:
   0: print

In [9]:

dis.dis(code)

  2           0 LOAD_NAME                0 (print)
              3 LOAD_CONST               0 ('hello, world')
              6 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
              9 POP_TOP

  3          10 LOAD_NAME                0 (print)
             13 LOAD_CONST               4 (3)
             16 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             19 POP_TOP
             20 LOAD_CONST               3 (None)
             23 RETURN_VALUE

In [11]:

import py_compile, compileall

path = '/home/kaka/kaka/udacity/1-LaneLines/tune.py'
py_compile.compile(path)

Out[11]:

'/home/kaka/kaka/udacity/1-LaneLines/__pycache__/tune.cpython-35.pyc'

In [12]:

compileall.compile_dir('.') # python -m compileall .

Listing '.'...
Listing './.ipynb_checkpoints'...

Out[12]:

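Execution¶

A program can dynamically execute "unknown" code at run time. This is often used for dynamically generated designs; for example, namedtuple builds a new type at run time. (The verbose=True argument used below to print the generated source was removed in Python 3.7.)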

In [14]:

import collections

User = collections.namedtuple("User", "name, age", verbose=True)

from builtins import property as _property, tuple as _tuple
from operator import itemgetter as _itemgetter
from collections import OrderedDict

class User(tuple):
    'User(name, age)'

    __slots__ = ()

    _fields = ('name', 'age')

    def __new__(_cls, name, age):
        'Create new instance of User(name, age)'
        return _tuple.__new__(_cls, (name, age))

    @classmethod
    def _make(cls, iterable, new=tuple.__new__, len=len):
        'Make a new User object from a sequence or iterable'
        result = new(cls, iterable)
        if len(result) != 2:
            raise TypeError('Expected 2 arguments, got %d' % len(result))
        return result

    def _replace(_self, **kwds):
        'Return a new User object replacing specified fields with new values'
        result = _self._make(map(kwds.pop, ('name', 'age'), _self))
        if kwds:
            raise ValueError('Got unexpected field names: %r' % list(kwds))
        return result

    def __repr__(self):
        'Return a nicely formatted representation string'
        return self.__class__.__name__ + '(name=%r, age=%r)' % self

    def _asdict(self):
        'Return a new OrderedDict which maps field names to their values.'
        return OrderedDict(zip(self._fields, self))

    def __getnewargs__(self):
        'Return self as a plain tuple.  Used by copy and pickle.'
        return tuple(self)

    name = _property(_itemgetter(0), doc='Alias for field number 0')

    age = _property(_itemgetter(1), doc='Alias for field number 1')

In [15]:

User

Out[15]:

__main__.User

In [16]:

u = User("kaka", 30)
u

Out[16]:

User(name='kaka', age=30)

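However the code is generated, in the end it is either imported as a module or executed by calling eval or exec. eval evaluates a single expression and exec executes a block of code; both accept a string or a compiled code object as the argument. A string is first checked against the syntax rules.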

In [17]:

s = "1 + 2 + 3"
eval(s)

Out[17]:

6

In [18]:

s = """
def test():
    print("hello world")
test()
"""
exec(s)

hello world

Either way a context is required; by default the current global and local namespaces are used.

In [19]:

x = 100
def test ():
    y = 200
    print(eval("x + y")) # x and y are looked up in the surrounding namespaces
    
test()

300

In [23]:

def test():
    print("test:", id(globals), id(locals))
    exec('print("exec:", id(globals), id(locals))')
    
test()

test: 140184209593112 140184209593832
exec: 140184209593112 140184209593832

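With the ability to operate on context namespaces, we can inject new members, new types, algorithms and so on into the outer environment, so that dynamic logic or results blend in and become part of the current system.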

In [25]:

s = """
class X: pass
def hello():
    print("hello, world")
"""
exec(s)

In [26]:

X

Out[26]:

__main__.X

In [27]:

X()

Out[27]:

<__main__.X at 0x7f7f20130860>

In [28]:

hello()

hello, world

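Sometimes the origin of dynamic code is uncertain. For safety, its execution must be isolated so that it cannot read or write the environment directly. This means passing container objects as a dedicated namespace for the dynamic code and running it in something like a simple sandbox.

Provide the globals and locals arguments separately as needed, or let both share one dict.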

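To make sure the code runs correctly, the interpreter automatically inserts a __builtins__ entry into the globals dict so that built-in functions can be found.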
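A minimal sketch of both effects: the automatic `__builtins__` entry, and what happens when the caller supplies an empty one. The variable names are illustrative, and restricting built-ins this way is not a real security boundary.

```python
sandbox = {}
exec("total = sum(range(4))", sandbox)       # sum is found via __builtins__
builtins_injected = "__builtins__" in sandbox
total = sandbox["total"]

restricted = {"__builtins__": {}}            # explicitly empty set of built-ins
try:
    eval("len('abc')", restricted)
    blocked = False
except NameError:                            # len can no longer be resolved
    blocked = True
```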

In [29]:

g = {"x": 100}
l = {"y": 200} 
eval("x+y", g, l) # separate dicts for globals and locals

Out[29]:

300

In [31]:

ns = { }
exec("class X: pass", ns) # globals and locals share one dict
# ns holds too many entries to print here

When both namespace arguments are given, locals always takes precedence by default, unless the dynamic code explicitly declares that it uses globals.

In [35]:

s = """
print(x) # locals
global y # globals
y += 100

z = x + y # locals
"""

g = {"x": 10, "y": 20}
l = {"x": 1000}
exec(s, g, l)
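The effect of the cell above can be checked by inspecting both dicts afterwards. This sketch repeats the same logic without the print call; `scope_globals` and `scope_locals` are illustrative names.

```python
source = """
global y          # y is read and written in globals
y += 100
z = x + y         # x is found in locals first; z is stored in locals
"""

scope_globals = {"x": 10, "y": 20}
scope_locals = {"x": 1000}
exec(source, scope_globals, scope_locals)

result = (scope_globals["y"], scope_locals["z"], "z" in scope_globals)
```

The global `y` becomes 120, while `z` is computed from the local `x` (1000) and lands in the locals dict only.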

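Inside a function scope, locals() always returns the namespace of the executing stack frame. So even if a locals namespace is supplied explicitly, it cannot be injected into functions defined in the dynamic code.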

In [38]:

s = """
print(id(locals()))
def test():
    print(id(locals()))
test()
"""
ns = {}
id(ns)

Out[38]:

140183978264840

In [40]:

exec(s, ns, ns) # locals() inside test is a different object from ns

140183978264840
140183975734536

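Built-in Types¶

Integers¶

For long numbers a thousands separator makes reading easier, but the comma has a special meaning in Python, so an underscore is used instead, with no restriction on the group size.

In [2]:

78_654_321 # supported from Python 3.6 on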

Out[2]:

78654321

In [3]:

0b110011 # binary literals start with 0b

Out[3]:

51

In [4]:

0o12 # octal literals start with 0o or 0O

Out[4]:

10

In [8]:

0x64 # hexadecimal

Out[8]:

100

In [9]:

0b_11001_1

Out[9]:

51

Conversion

In [10]:

bin(100) # decimal to binary

Out[10]:

'0b1100100'

In [11]:

oct(100) # decimal to octal

Out[11]:

'0o144'

In [12]:

hex(100) # decimal to hexadecimal

Out[12]:

'0x64'

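The int function defaults to base 10 and ignores whitespace such as spaces and tabs. When a base is specified, the corresponding prefix may be omitted.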

In [13]:

int("0b1100100", 2)

Out[13]:

100

In [14]:

int("0o144", 8)

Out[14]:

100

In [15]:

int("0x64", 16)

Out[15]:

100

In [17]:

int("64", 16) # the base prefix can be omitted

Out[17]:

100

In [18]:

eval("0o144") # eval can also do the conversion, but less efficiently

Out[18]:

100

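Converting an integer to a byte sequence is common in binary network protocols and file I/O. The byte order (big-endian or little-endian) must be specified.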

In [25]:

import sys

x = 0x1234
n = (x.bit_length() + 8 - 1) // 8 # number of bytes needed, rounded up to whole bytes
b = x.to_bytes(n, sys.byteorder)
b.hex(), type(b)

Out[25]:

('3412', bytes)

In [26]:

hex(int.from_bytes(b, sys.byteorder))

Out[26]:

'0x1234'
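Because `sys.byteorder` depends on the machine, a protocol usually fixes the byte order explicitly. A minimal sketch with the same value:

```python
x = 0x1234

big = x.to_bytes(2, "big")          # most significant byte first
little = x.to_bytes(2, "little")    # least significant byte first

# Reading the bytes back with the wrong order yields a different number.
round_trip = int.from_bytes(big, "big")
misread = int.from_bytes(little, "big")
```

Here `big.hex()` is `'1234'` and `little.hex()` is `'3412'`, which matches the output above on a little-endian machine.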

Operators¶

In [28]:

3 / 2 # in Python 3, dividing two integers yields a float

Out[28]:

1.5

In [29]:

4 / 2

Out[29]:

2.0

In [30]:

3 // 2 # floor division rounds toward negative infinity

Out[30]:

1

In [31]:

5 % 2 # remainder

Out[31]:

1

In [32]:

divmod(5, 2) # quotient and remainder in one call

Out[32]:

(2, 1)
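The difference between flooring and truncating only shows up with negative operands. A small sketch; the operand pairs are arbitrary examples.

```python
pairs = [(7, 2), (-7, 2), (7, -2)]

# For each pair: floor quotient, remainder, and the truncated quotient.
results = {(a, b): (a // b, a % b, int(a / b)) for a, b in pairs}

# The identity a == (a // b) * b + a % b holds in every case.
identity_holds = all(a == (a // b) * b + a % b for a, b in pairs)
```

The remainder takes the sign of the divisor, which is what keeps the identity true under floor division.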

In [34]:

1 > "" # Python 3 no longer supports ordering comparisons between numbers and non-numeric types

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-34-f899ebaf7f6e> in <module>()
----> 1 1 > "" # Python 3 no longer supports ordering comparisons between numbers and non-numeric types

TypeError: '>' not supported between instances of 'int' and 'str'

In [35]:

issubclass(bool, int) # bool is a subclass of int and can be used directly as a number

Out[35]:

True

In [36]:

isinstance(True, int)

Out[36]:

True

In [37]:

True == 1

Out[37]:

True

In [38]:

True + 1

Out[38]:

2

In a bool conversion, the number 0, None, empty sequences and empty dicts are treated as False and everything else as True. For a custom type, overriding __bool__ or __len__ changes the result.

In [42]:

data = (0, 0.0, None, "", list(), tuple(), dict(), set(), frozenset())
any(map(bool, data))

Out[42]:

False
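A minimal sketch of the two hooks on custom types. `Queue` and `Flag` are hypothetical classes.

```python
class Queue:
    """Truthiness falls back to __len__ when __bool__ is not defined."""
    def __init__(self, items=()):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

class Flag:
    """__bool__ takes precedence over __len__ when both exist."""
    def __init__(self, on):
        self.on = on

    def __bool__(self):
        return self.on

    def __len__(self):
        return 10

truth_values = (bool(Queue()), bool(Queue([1])), bool(Flag(False)), bool(Flag(True)))
```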

Enumerations¶

In [43]:

import enum
Color = enum.Enum("Color", "BLACK YELLOW BLUE RED")
isinstance(Color.BLACK, Color)

Out[43]:

True

In [44]:

list(Color)

Out[44]:

[<Color.BLACK: 1>, <Color.YELLOW: 2>, <Color.BLUE: 3>, <Color.RED: 4>]

In [46]:

class X(enum.Enum): # with subclassing, member values can be of any type
    A = "a"         # member names must be unique
    B = 100
    C = [1, 2, 3]
    
X.C

Out[46]:

<X.C: [1, 2, 3]>

In [48]:

print(X.B.name) # every member has a name and a value
print(X.B.value)
print(X["B"])
print(X([1, 2, 3]))

B
100
X.B
X.C

In [51]:

class X(enum.Enum):
    A = 1
    B = 1

print(X.A)
X(1) # returns the first member defined with that value; use enum.unique to forbid duplicate values

X.A

Out[51]:

<X.A: 1>
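A short sketch of `enum.unique` rejecting such aliases at class-creation time. `Status` is a hypothetical enumeration.

```python
import enum

try:
    @enum.unique
    class Status(enum.Enum):
        OK = 1
        DONE = 1        # same value as OK, so this would be an alias
    message = ""
except ValueError as exc:
    message = str(exc)  # names the duplicated member
```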

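Memory¶

The interpreter pre-caches commonly used small integers during initialization; later uses simply bind names to these cached objects, which improves performance. In CPython 3.6 the cached range is [-5, 256].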

In [52]:

a = -5
b = -5
a is b

Out[52]:

True

In [53]:

a = 256
b = 256
a is b

Out[53]:

True

In [54]:

a = -6 # outside the cache: a new object is created each time, which includes memory allocation
b = -6 
a is b

Out[54]:

False

In [55]:

a = 257
b = 257
a is b

Out[55]:

False

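Floating-Point Numbers¶

The default float type stores only double-precision values, with about 15 to 17 significant decimal digits. A float stores a binary approximation of a decimal number, which can make results differ from what the code appears to say and cause inconsistency bugs. Where precision is critical, choose a fixed-precision representation.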

In [58]:

1 / 3

Out[58]:

0.3333333333333333

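The float.hex method outputs the stored value as a hexadecimal string, which helps explain why results differ. It can also be used to pass float values around exactly, without losing precision.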

In [59]:

0.1 * 3 == 0.3

Out[59]:

False

In [60]:

(0.1 * 3).hex()

Out[60]:

'0x1.3333333333334p-2'

In [61]:

(0.3).hex()

Out[61]:

'0x1.3333333333333p-2'

In [63]:

s = (1 / 3).hex()
float.fromhex(s) # convert back to a float

Out[63]:

0.3333333333333333

In [64]:

round(0.1 * 3, 2) == round(0.3, 2) # round to a fixed precision; decimal.Decimal is the more exact approach

Out[64]:

True

In [65]:

round(0.1, 2) * 3 == round(0.3, 2) # using a rounded value as an operand loses precision again

Out[65]:

False
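When an approximate comparison is enough, `math.isclose` avoids hand-picked rounding. A minimal sketch; the tolerance `1e-12` is an arbitrary choice for illustration.

```python
import math

a = 0.1 * 3
b = 0.3

exact_equal = a == b                    # binary approximations differ
close = math.isclose(a, b)              # default relative tolerance is 1e-09

# Comparing against zero needs an absolute tolerance, because a
# relative tolerance scales with the magnitudes being compared.
residue = 0.1 + 0.1 + 0.1 - 0.3
near_zero_default = math.isclose(residue, 0.0)
near_zero_abs = math.isclose(residue, 0.0, abs_tol=1e-12)
```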

In [66]:

float(100)

Out[66]:

100.0

In [67]:

float("-100.23")

Out[67]:

-100.23

In [68]:

float("\t 100.123 \n") # like int, surrounding whitespace is ignored

Out[68]:

100.123

In [69]:

float("1.23e45") # scientific notation

Out[69]:

1.23e+45

In [71]:

int(2.6), int(-2.6) # truncates toward zero

Out[71]:

(2, -2)

In [72]:

from math import trunc, floor, ceil

trunc(2.6), trunc(-2.6) # drops the fractional part

Out[72]:

(2, -2)

In [73]:

floor(2.6), floor(-2.6) # rounds down

Out[73]:

(2, -3)

In [74]:

ceil(2.6), ceil(-2.6) # rounds up

Out[74]:

(3, -2)

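Compared with float, a hardware-based binary floating-point type, decimal.Decimal is a decimal implementation. Its default precision is 28 significant digits (adjustable through the context), and it represents decimal numbers and arithmetic exactly, without the binary approximation problem.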

In [75]:

1.1 + 2.2

Out[75]:

3.3000000000000003

In [76]:

(0.1 + 0.1 + 0.1 - 0.3) == 0

Out[76]:

False

In [77]:

from decimal import Decimal
Decimal("1.1") + Decimal("2.2")

Out[77]:

Decimal('3.3')

In [78]:

(Decimal("0.1") + Decimal("0.1") + Decimal("0.1") - Decimal("0.3")) == 0

Out[78]:

True

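When creating a Decimal instance, pass an exact value such as an integer or a string. If a float is passed, precision has already been lost before the Decimal is built.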

In [79]:

Decimal(0.1)

Out[79]:

Decimal('0.1000000000000000055511151231257827021181583404541015625')

In [80]:

Decimal("0.1")

Out[80]:

Decimal('0.1')

In [81]:

from decimal import Decimal, getcontext
getcontext()

Out[81]:

Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999, Emax=999999, capitals=1, clamp=0, flags=[FloatOperation], traps=[InvalidOperation, DivisionByZero, Overflow])

In [83]:

getcontext().prec = 2 # change the default precision of 28 digits
Decimal(1) / Decimal(3)

Out[83]:

Decimal('0.33')

In [85]:

from decimal import localcontext
with localcontext() as ctx: # change the precision within a limited scope
    ctx.prec = 5
    print(getcontext().prec)
    print(Decimal(1) / Decimal(3))

5
0.33333

In [87]:

round(0.5) # round() rounds halves to the nearest even integer, so results can differ from schoolbook rounding

Out[87]:

0

In [88]:

round(1.5)

Out[88]:

2

In [90]:

from decimal import Decimal, ROUND_HALF_UP # Decimal avoids this problem

def roundx(x, n):
    return Decimal(x).quantize(Decimal(n), ROUND_HALF_UP) # strict round-half-up

In [91]:

roundx("1.24", ".1")

Out[91]:

Decimal('1.2')

In [92]:

roundx("1.25", ".1")

Out[92]:

Decimal('1.3')

In [93]:

roundx("1.26", ".1")

Out[93]:

Decimal('1.3')
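The two effects at play, the half-to-even rule and binary approximation, can be separated in a short sketch. The sample values are arbitrary.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

# Built-in round() uses round-half-to-even on exactly representable halves.
builtin = [round(0.5), round(1.5), round(2.5)]

# Decimal lets the caller choose the rounding mode explicitly.
half_even = Decimal("2.5").quantize(Decimal("1"), rounding=ROUND_HALF_EVEN)
half_up = Decimal("2.5").quantize(Decimal("1"), rounding=ROUND_HALF_UP)

# 2.675 has no exact binary form and is stored slightly below 2.675.
binary_effect = round(2.675, 2)
exact_decimal = Decimal("2.675").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```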

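Strings¶

A string stores Unicode text and is an immutable sequence type. A UTF encoding turns code-point integers into a byte format that computers can store. UTF-8 is compatible with ASCII and is the most widely used.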

In [95]:

s = "汉字"
len(s)

Out[95]:

2

In [96]:

hex(ord("汉"))

Out[96]:

'0x6c49'

In [97]:

chr(0x6c49)

Out[97]:

'汉'

In [98]:

ascii("汉字")

Out[98]:

"'\\u6c49\\u5b57'"

In [99]:

"h\x69, \u6C49\U00005B57" # \U takes 8 hex digits (32 bits) and \u takes 4 hex digits (16 bits)

Out[99]:

'hi, 汉字'

In [100]:

type(u"abc") # str is Unicode by default; the u prefix is unnecessary

Out[100]:

str

In [101]:

type(b"abc") # a bytes object

Out[101]:

bytes

In [104]:

import dis 
def test():
    a = "x" + "y" + "z" # the result is computed at compile time
    b = "a" * 10
    return a, b

dis.dis(test)

  3           0 LOAD_CONST               7 ('xyz')
              2 STORE_FAST               0 (a)

  4           4 LOAD_CONST               8 ('aaaaaaaaaa')
              6 STORE_FAST               1 (b)

  5           8 LOAD_FAST                0 (a)
             10 LOAD_FAST                1 (b)
             12 BUILD_TUPLE              2
             14 RETURN_VALUE

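When concatenating many strings dynamically, prefer join and format.

join can compute the total length in advance, allocate once, and then fill the memory by copying. format, for its part, separates the fixed template from the variables, which is easier to read and maintain.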

In [105]:

username = "kaka"
datetime = "2018"

tmp = "/data/{user}/message/{time}.txt"
tmp.format(user=username, time=datetime)

Out[105]:

'/data/kaka/message/2018.txt'

In [106]:

s = "-" * 1024
s1 = s[10:1000]
s2 = s[:]
s3 = s.split(",")[0] # same content

In [107]:

s1 is s

Out[107]:

False

In [108]:

s2 is s

Out[108]:

True

In [109]:

s3 is s

Out[109]:

True

In [110]:

s = "汉字"
b = s.encode("utf-16")
b.decode("utf-16")

Out[110]:

'汉字'

The codecs module helps when dealing with BOM information.

In [112]:

s = "汉字"
s.encode("utf-16").hex()

Out[112]:

'fffe496c575b'

In [114]:

import codecs
codecs.BOM_UTF16_LE.hex() # the BOM marker

Out[114]:

'fffe'

In [115]:

codecs.encode(s, "utf-16be").hex() # an explicit byte order omits the BOM

Out[115]:

'6c495b57'

In [116]:

codecs.encode(s, "utf-16le").hex()

Out[116]:

'496c575b'
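On the decoding side, the codec name decides whether a BOM is consumed or kept. A minimal sketch using the standard `utf-8-sig` and `utf-16` codecs:

```python
import codecs

text = "汉字"
data = codecs.BOM_UTF8 + text.encode("utf-8")   # UTF-8 bytes with a leading BOM

plain = data.decode("utf-8")        # the BOM survives as U+FEFF
clean = data.decode("utf-8-sig")    # the BOM is stripped

utf16 = text.encode("utf-16")       # BOM followed by native byte order
round_trip = utf16.decode("utf-16") # the BOM selects the byte order and is removed
```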

In [117]:

sys.getdefaultencoding() # the default encoding in Python 3 is UTF-8 rather than ASCII; no extra setup is needed

Out[117]:

'utf-8'

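Python 3.6 added f-strings, a standard feature in most scripting languages. With the f prefix, the parser processes the fields and expressions inside braces and substitutes objects of the same name found in the surrounding namespaces. Format control still follows the format specification, but the result reads better.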

In [118]:

x = 10
y = 20
f"{x} + {y} = {x + y}"

Out[118]:

'10 + 20 = 30'

In [119]:

"{} + {} = {}".format(x, y, x+y)

Out[119]:

'10 + 20 = 30'

In [120]:

f"{type(x)}" # besides operators, a field can also contain a function call

Out[120]:

"<class 'int'>"

In [121]:

"{0} {1} {0}".format("a", 10) # explicit positional indexes

Out[121]:

'a 10 a'

In [126]:

"{} {}".format("a", 10) # automatic numbering

Out[126]:

'a 10'

In [122]:

"{x} {y}".format(x = 100, y = [1, 2, 3]) # keyword names

Out[122]:

'100 [1, 2, 3]'

In [124]:

class X: name = "admin"

x = X()
x.name = "jack"
"{0.name}".format(x) # attribute access

Out[124]:

'jack'

In [125]:

"{0[2]}".format([1, 2, 3, 4]) # index access

Out[125]:

'3'

In [127]:

"{0:#08b}".format(5) # width and zero padding

Out[127]:

'0b000101'

In [129]:

"{:06.2f}".format(1.234) # keep two decimal places

Out[129]:

'001.23'

In [130]:

"{:,}".format(1234567) # thousands separator

Out[130]:

'1,234,567'

In [132]:

"[{:^10}]".format("abc") # centered

Out[132]:

'[   abc    ]'

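Interning

Identical strings can share one instance: their content is the same and they are immutable, so sharing is safe. Comparing interned strings needs no extra computation, only a pointer comparison, which is fast.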

In [134]:

import sys
"__name__" is sys.intern("__name__")

Out[134]:

True

In [144]:

a = "hello, world!"
b = "hello, world!"
a is b # different instances

Out[144]:

False

In [145]:

sys.intern(a) is sys.intern("hello, world!") # the same instance

Out[145]:

True

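Byte Sequences¶

When we talk about byte sequences we care mostly about how data is stored and transmitted; when we deal with typed objects we care mostly about their abstract properties.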

In [148]:

print(b"abc")
print(bytes("汉字", "utf-8"))

b'abc'
b'\xe6\xb1\x89\xe5\xad\x97'

In [149]:

a = b"abc"
b = a + b"def"
b.startswith(b"abc")

Out[149]:

True

In [150]:

b.upper()

Out[150]:

b'ABCDEF'

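A bytes object is allocated once, whereas a bytearray can grow on demand, which makes it better suited as a read-write buffer. If necessary, enough memory can be allocated up front to avoid the overhead of growing midway.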

In [156]:

b = bytearray(b"ab")
len(b)

Out[156]:

2

In [157]:

b.append(ord("c"))
b.extend(b"de")
b

Out[157]:

bytearray(b'abcde')
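Preallocation can be sketched with a fixed-size bytearray that is filled in place. `io.BytesIO` stands in for a file or socket here, and the buffer size of 16 is arbitrary.

```python
import io

stream = io.BytesIO(b"hello world")   # stand-in for a file or socket
buffer = bytearray(16)                # 16 zero bytes allocated up front
view = memoryview(buffer)

count = stream.readinto(view)         # fills the existing buffer, no new object
payload = bytes(view[:count])         # copy out only the bytes actually read
```

The buffer can be reused for the next read, so no intermediate bytes objects are created per read.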

In [158]:

b"abc" + b"123" # concatenation

Out[158]:

b'abc123'

In [159]:

b"abc" * 2 # repetition

Out[159]:

b'abcabc'

A memory view can reference a slice of byte data directly, without copying. Supported types include bytes, bytearray, array.array and some NumPy types.

In [161]:

a = bytearray([0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16])
v = memoryview(a)

In [167]:

x = v[2:5] # a slice of the view; writing through x also changes the original data
x.hex()

Out[167]:

'121314'

In [163]:

a[3] = 0xee
x.hex()

Out[163]:

'12ee14'

In [164]:

x[1] = 0x13
a

Out[164]:

bytearray(b'\x10\x11\x12\x13\x14\x15\x16')

In [165]:

a = b"\x10\x11"
v = memoryview(a)
v[1] = 0xee # bytes is immutable

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-165-4f5205a802be> in <module>()
      1 a = b"\x10\x11"
      2 v = memoryview(a)
----> 3 v[1] = 0xee # bytes is immutable

TypeError: cannot modify read-only memory

To copy a view, use the tobytes or tolist method. The copied data is independent of the original object, and copying does not affect the view itself.

In [168]:

a = bytearray([0x10, 0x11, 0x12,  0x13, 0x14, 0x15, 0x16])
v = memoryview(a)
x = v[2:5]
b = x.tobytes() # copy the view
b

Out[168]:

b'\x12\x13\x14'

In [169]:

a[3] = 0xee
b

Out[169]:

b'\x12\x13\x14'

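Lists¶

Internally a list has two parts: a header that stores the element count and the allocated capacity, and a separate array that stores pointers. Every item is kept as a pointer reference in that array; the actual content is not embedded.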

In [170]:

list("abc")

Out[170]:

['a', 'b', 'c']

In [171]:

list(range(3))

Out[171]:

[0, 1, 2]

In [172]:

[x + 1 for x in range(6) if x % 2 == 0] # list comprehension

Out[172]:

[1, 3, 5]

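To implement a custom list, it is recommended to build on the collections.UserList wrapper class. Besides fitting into the collections.abc hierarchy, the most important point is that this type overloads and completes the relevant operators.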

In [174]:

import collections
print(list.__bases__)
print(collections.UserList.__bases__)

(<class 'object'>,)
(<class 'collections.abc.MutableSequence'>,)

In [175]:

class A(list): pass # compare the results of the two base classes

type(A("abc") + list("de")) # returns list, not A

Out[175]:

list

In [176]:

class B(collections.UserList): pass

type(B("abc") + list("de")) # returns B

Out[176]:

__main__.B
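A minimal sketch of a custom list built this way. `IntList` is a hypothetical class that validates only `append`; a complete version would also guard `insert`, `extend` and item assignment.

```python
import collections

class IntList(collections.UserList):
    """Hypothetical list that accepts only integers through append()."""

    def append(self, item):
        if not isinstance(item, int):
            raise TypeError("IntList accepts only int")
        super().append(item)

numbers = IntList([1, 2])
numbers.append(3)
combined = numbers + [4]          # operators preserve the subclass type

try:
    numbers.append("x")
    rejected = False
except TypeError:
    rejected = True
```

The underlying plain list stays available as `combined.data`.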

In [177]:

a = [1, 2]
b = a
a = a + [3, 4] # does not modify the original object
print(a)
print(b)

[1, 2, 3, 4]
[1, 2]

In [179]:

a = [1, 2]
b = a
a += [3, 4] # modifies the original object: the compiler turns += into an INPLACE_ADD operation instead of creating a new list
print(a)
print(b)

[1, 2, 3, 4]
[1, 2, 3, 4]

In [180]:

2 in [1, 2] # the in operator is the idiomatic membership test

Out[180]:

True

In [181]:

a = [0, 1, 2, 3, 4, 5]
del a[5] # delete a single element
a

Out[181]:

[0, 1, 2, 3, 4]

In [182]:

del a[1:3] # delete a range
a

Out[182]:

[0, 3, 4]

In [185]:

a = [0, 2, 4, 6]
b = a[:2]
a[0] is b[0] # the slice copies references, which still point to the same objects

Out[185]:

True

In [186]:

a.insert(1, 1) # operating on list a does not affect b
a

Out[186]:

[0, 1, 2, 4, 6]

In [187]:

b # changes to the lists themselves are independent; changes to the shared element objects are visible through both

Out[187]:

[0, 2]

In [188]:

class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age
        
    def __repr__(self):
        return f"{self.name} {self.age}"
    
users = [User(f"user{i}", i) for i in (3, 1, 0, 2)]
users

Out[188]:

[user3 3, user1 1, user0 0, user2 2]

In [189]:

users.sort(key = lambda u: u.age) # a sort key can be specified
users

Out[189]:

[user0 0, user1 1, user2 2, user3 3]

In [190]:

d = [3, 0, 2, 1]
sorted(d) # sorted returns a sorted copy

Out[190]:

[0, 1, 2, 3]

In [193]:

import bisect
d = [0, 2, 4]
bisect.insort_left(d, 2) # inserts into a sorted sequence using binary search; useful for priority queues or consistent hashing
d

Out[193]:

[0, 2, 2, 4]
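Besides insertion, bisect is handy for table lookups over sorted breakpoints. The sketch below follows the grading pattern from the standard library documentation; the breakpoints and grade letters are illustrative.

```python
import bisect

def grade(score, breakpoints=(60, 70, 80, 90), grades="FDCBA"):
    # bisect returns the insertion point to the right of equal entries,
    # which is also the index of the matching grade.
    return grades[bisect.bisect(breakpoints, score)]

letters = [grade(s) for s in (33, 60, 85, 100)]
```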

A tuple is generally used as a read-only version of a list.

In [194]:

a = tuple([1, "abc"])
a[0] = 2 # read-only

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-194-a4bcaeef266e> in <module>()
      1 a = tuple([1, "abc"])
----> 2 a[0] = 2 # read-only

TypeError: 'tuple' object does not support item assignment

In [195]:

a = (1, )
type(a)

Out[195]:

tuple

In [196]:

b = (1)
type(b)

Out[196]:

int

In [197]:

(1, 2) + (3, 4) # supports list-like operations, but cannot be modified in place; each operation returns a new object

Out[197]:

(1, 2, 3, 4)

In [199]:

a = (1, 2, 3)
b = a
a += (4, 5) # creates a new tuple
print(a)
print(b)

(1, 2, 3, 4, 5)
(1, 2, 3)

In [202]:

User = collections.namedtuple("User", "name,age")
issubclass(User, tuple)

Out[202]:

True

In [203]:

u = User("kaka", 26)
u.name, u.age # access by field name

Out[203]:

('kaka', 26)

In [205]:

u[0] is  u.name # access by index

Out[205]:

True

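The essential differences between an array and a list or tuple are that its elements share a single type and that their content is embedded.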

In [208]:

import array
a = array.array("b", [0x11,  0x22, 0x33, 0x44])
memoryview(a).hex() # the memory view shows embedded content rather than pointers

Out[208]:

'11223344'

In [209]:

a = array.array("i")
a.append(100)
a.append(1.23) # raises TypeError: only a single element type can be stored

In [211]:

a = array.array("i", [1, 2, 3])
a.buffer_info() # returns the buffer's memory address and length in elements

Out[211]:

(139639737860544, 3)

In [212]:

a.extend(range(100000))
a.buffer_info()

Out[212]:

(30341376, 100003)

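Dictionaries¶

Dictionary values can be of any type, but keys must be hashable. Lists, sets and the like cannot be used as keys. Even a tuple, which is immutable, cannot serve as a key if any of its elements refers to a mutable (unhashable) object.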

In [213]:

from collections.abc import Hashable  # the collections.Hashable alias was removed in Python 3.10
issubclass(list, Hashable)

Out[213]:

False

In [214]:

issubclass(int, Hashable)

Out[214]:

True

In [215]:

hash((1, 2, 3))

Out[215]:

2528502973977326415

In [216]:

hash((1, 2, [3, 5]))

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-216-bc5b9806bdf5> in <module>()
----> 1 hash((1, 2, [3, 5]))

TypeError: unhashable type: 'list'

In [217]:

{"a": 1, "b":2} # built with braces

Out[217]:

{'a': 1, 'b': 2}

In [218]:

dict(a = 1, b = 2) # built with the type constructor

Out[218]:

{'a': 1, 'b': 2}

In [219]:

kvs = (("a", 1), ["b", 2])
dict(kvs)

Out[219]:

{'a': 1, 'b': 2}

In [220]:

dict(zip("abc", range(3)))

Out[220]:

{'a': 0, 'b': 1, 'c': 2}

In [222]:

dict(map(lambda k, v: (k, v+10), "abc", range(3))) # transform the data with a lambda

Out[222]:

{'a': 10, 'b': 11, 'c': 12}

In [223]:

{k: v + 10 for k, v in zip("abc", range(3))} # transform the data with a comprehension

Out[223]:

{'a': 10, 'b': 11, 'c': 12}

In [224]:

a = {"a": 1}
b = dict(a, b=2) # copy a and add key-value pairs
b

Out[224]:

{'a': 1, 'b': 2}

In [226]:

c = dict.fromkeys(b, 0) # use only the keys of b and set every value to 0
c

Out[226]:

{'a': 0, 'b': 0}

In [228]:

d = dict.fromkeys(("counter1", "counter2"), 0) # supply the keys explicitly
d

Out[228]:

{'counter1': 0, 'counter2': 0}

In [230]:

x = dict(a=1)
x["b"]  # a missing key raises an exception

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-230-7021c8efe5ab> in <module>()
      1 x = dict(a=1)
----> 2 x["b"]  # a missing key raises an exception

KeyError: 'b'

In [231]:

"b" in x # check first

Out[231]:

False

In [232]:

x.get("b", 100) # missing key: returns the default value 100

Out[232]:

100

In [233]:

x.get("a", 100) # existing key: returns the stored value

Out[233]:

1

In [238]:

x = {}
x.setdefault("a", 0) # returns the stored value if "a" exists, otherwise adds the pair {"a": 0}

Out[238]:

0

In [239]:

x

Out[239]:

{'a': 0}

In [240]:

x["a"] = 100
x.setdefault("a", 0)

Out[240]:

100

In [241]:

{"b":2, "a":1} == {"a":1, "b":2} # dictionaries can be compared by content

Out[241]:

True

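In Python 3, dictionary content is exposed through views by default, which avoids copying and also lets you observe changes to the dictionary as they happen.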

In [244]:

x = dict(a = 1, b = 2)
ks = x.keys() # key view
"b" in ks

Out[244]:

True

In [245]:

for k in ks: print(k, x[k]) # iterate over the dictionary through the view

a 1
b 2

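A view reads the dictionary content in sync but cannot modify it, and you can pass a view of whichever granularity fits (keys, values or items). This restricts the receiver to observing the dictionary in that particular way.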

In [246]:

def test(d): # receives the items view: it can read the pairs but cannot modify them
    for k, v in d:
        print(k, v)
        
x = dict(a = 1)
d = x.items()
test(d)

a 1
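The "live" nature of views can be sketched in a few lines. `config` and its keys are illustrative.

```python
config = {"host": "localhost"}
keys = config.keys()
items = config.items()

before = len(keys)
config["port"] = 8080                 # change the dict after creating the views
after = len(keys)                     # the view reflects the change

has_port = ("port", 8080) in items
supports_assignment = hasattr(keys, "__setitem__")   # views expose no write API
```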

In [247]:

a = dict(a = 1, b = 2)
b = dict(c = 3, b = 2)
ka = a.keys()
kb = b.keys()
ka & kb # intersection; views support set operations

Out[247]:

{'b'}

In [249]:

ka | kb # union

Out[249]:

{'a', 'b', 'c'}

In [250]:

ka - kb # difference

Out[250]:

{'a'}

In [251]:

ka ^ kb # symmetric difference: keys that appear in only one of a or b, i.e. the union minus the intersection

Out[251]:

{'a', 'c'}

In [254]:

a = dict(a = 1, b = 2)
b = dict(b = 20, c = 3)
ks = a.keys() & b.keys() # restrict the update to keys that already exist in a
a.update({k:b[k] for k in ks}) # take the values to update from the intersection
a

Out[254]:

{'a': 1, 'b': 20}

A default dictionary (defaultdict) supplies a default value when a key does not exist.

In [255]:

d = collections.defaultdict(lambda : 100)
d["a"]

Out[255]:

100

In [256]:

d["b"] += 1
d

Out[256]:

defaultdict(<function __main__.<lambda>>, {'a': 100, 'b': 101})

An ordered dictionary (OrderedDict) records the order in which keys were first inserted.

In [257]:

d = collections.OrderedDict()
d["z"] = 1
d["a"] = 2
d["x"] = 3
for k, v in d.items(): print(k, v) # kept in insertion order

z 1
a 2
x 3

A Counter returns 0 for a missing key without adding an entry.

In [259]:

d = collections.Counter()
d["a"] # plain access does not add a key

Out[259]:

0

In [261]:

d["b"] += 1
d

Out[261]:

Counter({'b': 1})

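A ChainMap gives access to several dictionaries through a single interface. It stores no data itself: reads search the dictionaries in argument order, but modifications (insert, update, delete) apply only to the first dictionary.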

In [262]:

a = dict(a = 1, b = 2)
b = dict(b = 20, c = 30)
x = collections.ChainMap(a, b)
x["b"], x["c"]

Out[262]:

(2, 30)

In [263]:

x["b"] = 999 # update, performed on the first dictionary
x["z"] = 888 # insert, performed on the first dictionary
x

Out[263]:

ChainMap({'a': 1, 'b': 999, 'z': 888}, {'b': 20, 'c': 30})

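A ChainMap suits multi-level context structures. A sound context needs two properties: first, inheritance, so that all settings can be read by later functions in the call chain; second, modifications affect only the current and subsequent logic and must not propagate to unrelated parents.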

In [266]:

root = collections.ChainMap({"a":1})
child = root.new_child({"b": 200})
child["a"] = 100
child

Out[266]:

ChainMap({'b': 200, 'a': 100}, {'a': 1})

In [267]:

child.parents

Out[267]:

ChainMap({'a': 1})
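A layered configuration is a common way to use this. A minimal sketch; `defaults`, `request` and `handler` are hypothetical layers.

```python
import collections

defaults = {"color": "red", "debug": False}

# An empty first mapping receives all writes, so defaults stays untouched.
request = collections.ChainMap({}, defaults)
request["debug"] = True

# A child context overrides one setting for the code further down the chain.
handler = request.new_child({"color": "blue"})

seen_by_handler = (handler["color"], handler["debug"])
seen_by_request = (request["color"], request["debug"])
```

Lookups fall through the layers from child to parent, while `defaults` itself is never modified.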

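Sets¶

A set stores non-duplicate objects. Two objects count as duplicates if they are the same object or if they are equal in value. The test is: (a is b) OR (hash(a) == hash(b) AND a == b)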

In [269]:

a = 1234
b = 1234
a is b # a and b have the same content but are not the same object

Out[269]:

False

In [270]:

s = {a}
b in s # membership test with b, which has the same content

Out[270]:

True
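The identity branch of the formula becomes visible with NaN, which never compares equal to itself. A minimal sketch:

```python
nan = float("nan")
bucket = {nan}

equal_to_itself = nan == nan                 # NaN is never equal by value
found_by_identity = nan in bucket            # found because it is the same object
found_by_value = float("nan") in bucket      # a different NaN object is not found
```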

Initialization resembles a dict and uses braces, but the initial data are not key-value pairs, and the elements must be hashable.

In [271]:

type({})

Out[271]:

dict

In [272]:

type({"a": 1})

Out[272]:

dict

In [273]:

type({1})

Out[273]:

set

In [274]:

set((1, "a", 1.0))

Out[274]:

{'a', 1}

In [276]:

frozenset(range(3)) # immutable set

Out[276]:

frozenset({0, 1, 2})

In [277]:

{x + 1 for x in range(6) if x % 2 == 0}

Out[277]:

{1, 3, 5}

In [279]:

s = {1}
f = frozenset(s) # the two types convert to each other
f

Out[279]:

frozenset({1})

In [280]:

set(f)

Out[280]:

{1}

In [281]:

{1, 2} > {2, 1} # ordering (subset and superset) and equality operators are supported

Out[281]:

False

In [282]:

{1, 2} == {2, 1}

Out[282]:

True

In [283]:

{1, 2} <= {1, 2, 3} # subset

Out[283]:

True

In [284]:

{1, 2, 3} >= {1, 2} # superset

Out[284]:

True

In [285]:

{1, 2} in {1, 2, 3} # tests whether {1, 2} is contained as a single element (the whole set treated as one element)

Out[285]:

False

Like dictionary views, sets support intersection, union, difference and symmetric difference.

In [286]:

{1, 2, 3} & {2, 3, 4} # intersection

Out[286]:

{2, 3}

In [287]:

{1, 2, 3} | {2, 3, 4} # union

Out[287]:

{1, 2, 3, 4}

In [288]:

{1, 2, 3} - {2, 3, 4} # difference

Out[288]:

{1}

In [290]:

{1, 2, 3} ^ {2, 3, 4} # symmetric difference

Out[290]:

{1, 4}

In [292]:

x = {1, 2}
x |= {2, 3} # update
x

Out[292]:

{1, 2, 3}

In [294]:

x = {1, 2}
x &= {2, 3} # intersection_update
x

Out[294]:

{2}

In [295]:

x = {2, 1}
x.remove(2)
x.remove(2) # removing a missing element raises an exception

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-295-4cf8e99936c7> in <module>()
      1 x = {2, 1}
      2 x.remove(2)
----> 3 x.remove(2)

KeyError: 2

In [296]:

x.discard(2) # discard does not raise an exception

A custom type is hashable by default, but the default implementation is not enough to deduplicate instances by value in a set.

In [306]:

class User:
    def __init__(self, uid, name):
        self.uid = uid
        self.name = name
        
import collections.abc  # the collections.Hashable alias was removed in Python 3.10
issubclass(User, collections.abc.Hashable)

Out[306]:

True

In [307]:

u1 = User(1, "user1")
u2 = User(1, "user1")
s = set()
s.add(u1)
s.add(u2)
s

Out[307]:

{<__main__.User at 0x7f0068875630>, <__main__.User at 0x7f0068875240>}

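The reason is that the default __hash__ is derived from the object's identity, so it differs between instances, and the default __eq__ only compares identity. Both methods therefore need to be overridden.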

In [309]:

class User:
    def __init__(self, uid, name):
        self.uid = uid
        self.name = name
    def __hash__(self):
        return hash(self.uid)
    def __eq__(self, other):
        return self.uid == other.uid
    
u1 = User(1, "user1")
u2 = User(1, "user1")
s = set()
s.add(u1)
s.add(u2)
s

Out[309]:

{<__main__.User at 0x7f0069309da0>}

In [310]:

u1 in s

Out[310]:

True

In [312]:

u2 in s # only the uid field is checked

Out[312]:

True
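A slightly more defensive variant of the class above, as a sketch: `__eq__` returns `NotImplemented` for foreign types, so comparing a user with an unrelated object does not raise AttributeError. The sample users are illustrative.

```python
class User:
    def __init__(self, uid, name):
        self.uid = uid
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, User):
            return NotImplemented      # let Python fall back to its default
        return self.uid == other.uid

    def __hash__(self):
        return hash(self.uid)          # must agree with __eq__

unique = {User(1, "user1"), User(1, "user1"), User(2, "user2")}
equals_foreign = User(1, "user1") == "user1"
```

Objects that compare equal must have equal hashes; deriving both from `uid` keeps that rule.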

Kaka Blog

Chapter 2: Types

kaka
